
# Spatial Aggregations of Satellite-derived PM2.5 Estimates from Washington University in St. Louis

This dataset contains spatial aggregations of the PM2.5 estimates generated by the [Atmospheric Composition Analysis Group](https://sites.wustl.edu/acag/datasets/surface-pm2-5/). The satellite-derived PM2.5 is aggregated from a grid/raster (NetCDF) to polygons (SHP). The aggregations are performed on the V5.GL.04 files for North America only.

---

## Source Data

### Washington University PM2.5

The [Atmospheric Composition Analysis Group](https://sites.wustl.edu/acag/datasets/surface-pm2-5/) uses a combination of satellite images, monitors, and simulation to generate estimates of PM2.5. Estimates are stored in NetCDF files and made publicly available. There are several versions of the estimates.

Version [V5.GL.04](https://sites.wustl.edu/acag/datasets/surface-pm2-5/#V5.GL.04) provides mean PM2.5 (µg/m³) at the following:

- **Temporal frequency**: Yearly and monthly  
- **Grid resolutions**: 0.1° × 0.1° and 0.01° × 0.01°  
- **Geographic regions**: North America, Europe, Asia, and Global

---

### Shapefiles

Cartographic boundary shapefiles used for these aggregations were downloaded from the [US Census Bureau](https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html).

#### County

> Note: For data years before 2013, the 2013 shapefile was used.

- [2013](https://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_county_500k.zip)
- [2014](https://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_county_500k.zip)
- [2015](https://www2.census.gov/geo/tiger/GENZ2015/shp/cb_2015_us_county_500k.zip)
- [2016](https://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_county_500k.zip)
- [2017](https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_us_county_500k.zip)
- [2018](https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_500k.zip)
- [2019](https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_500k.zip)
- [2020](https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_county_500k.zip)
- [2021](https://www2.census.gov/geo/tiger/GENZ2021/shp/cb_2021_us_county_500k.zip)
- [2022](https://www2.census.gov/geo/tiger/GENZ2022/shp/cb_2022_us_county_500k.zip)

**More info:**

- [County shapefiles (cartographic boundaries)](https://www.census.gov/programs-surveys/geography/guidance/tiger-data-products-guide.html)
- [2013 ReadMe](https://www2.census.gov/geo/tiger/GENZ2013/2013_file_name_def.pdf)

#### ZCTA

> Note: For data years before 2000, the 2000 shapefile was used.

- [2000](https://www2.census.gov/geo/tiger/GENZ2010/gz_2010_us_860_00_500k.zip)
- [2010](https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_zcta510_500k.zip)
- [2020](https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_zcta520_500k.zip)

**More info:**

- [ZCTA shapefiles (cartographic boundaries)](https://www.census.gov/programs-surveys/geography/guidance/tiger-data-products-guide.html)
- [2010 ReadMe](https://www2.census.gov/geo/tiger/GENZ2010/ReadMe.pdf)
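
The fallback rules in the notes above can be written as a small lookup. This is a minimal sketch, not code from the repository: the function names are hypothetical, and two rules are assumptions because the notes only state the fallback cases. It assumes county years from 2013 on use the file of the same year, and ZCTA years from 2000 on use the most recent listed vintage that is not later than the data year.

```python
ZCTA_VINTAGES = (2000, 2010, 2020)  # ZCTA files listed above


def county_shapefile_year(data_year: int) -> int:
    """County boundary vintage for a data year (hypothetical helper)."""
    # Documented rule: years before 2013 fall back to the 2013 file.
    # Assumption: later years use the file of the same year.
    return max(data_year, 2013)


def zcta_shapefile_year(data_year: int) -> int:
    """ZCTA boundary vintage for a data year (hypothetical helper)."""
    # Documented rule: years before 2000 use the 2000 file.
    # Assumption: later years use the latest vintage not after the data year.
    eligible = [v for v in ZCTA_VINTAGES if v <= data_year]
    return eligible[-1] if eligible else ZCTA_VINTAGES[0]


print(county_shapefile_year(1998))  # 2013
print(zcta_shapefile_year(2015))    # 2010
```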

---

## Dataset Details

Each uploaded file contains spatial aggregations of PM2.5 estimates for North America for a given year, spatial resolution, and temporal frequency.

- **Spatial Coverage**: North America
- **Spatial Resolution**: County, ZCTA
- **Temporal Coverage**: 1998–2022
- **Temporal Resolution**: Monthly, Yearly
- **File Type**: `.parquet` – Apache Parquet is an open-source, columnar data format optimized for efficient storage, compression, and retrieval of large datasets. It supports both batch and interactive workloads, similar to other Hadoop columnar formats like RCFile and ORC. [Learn more](https://www.databricks.com/glossary/what-is-parquet)

---

### Codebook

#### ZCTA Monthly

| Property | Description                             |
| -------- | --------------------------------------- |
| `Zcta`   | 5-digit ZIP Code Tabulation Area (ZCTA) |
| `Month`  | Month of the PM2.5 estimate             |
| `Year`   | Year of the PM2.5 estimate              |
| `Pm25`   | PM2.5 concentration (µg/m³)             |


#### ZCTA Yearly

| Property | Description                             |
| -------- | --------------------------------------- |
| `Zcta`   | 5-digit ZIP Code Tabulation Area (ZCTA) |
| `Year`   | Year of the PM2.5 estimate              |
| `Pm25`   | PM2.5 concentration (µg/m³)             |


#### County Monthly

| Property | Description                 |
| -------- | --------------------------- |
| `County` | County FIPS code            |
| `Month`  | Month of the PM2.5 estimate |
| `Year`   | Year of the PM2.5 estimate  |
| `Pm25`   | PM2.5 concentration (µg/m³) |


#### County Yearly

| Property | Description                 |
| -------- | --------------------------- |
| `County` | County FIPS code            |
| `Year`   | Year of the PM2.5 estimate  |
| `Pm25`   | PM2.5 concentration (µg/m³) |
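
ZCTA and county FIPS codes are identifiers, not numbers, and they lose their leading zeros if a tool reads them as integers. The sketch below shows one way to restore the 5-character form and to split a county FIPS code into its 2-digit state part and 3-digit county part. The helper names are hypothetical, and whether the `Zcta` and `County` columns are stored as strings or integers in these files is not stated here, so treat this as a defensive step rather than a required one.

```python
from typing import Tuple, Union


def _five_digit(value: Union[int, str], label: str) -> str:
    """Zero-pad a code to 5 characters and check that it is numeric."""
    code = str(value).strip().zfill(5)
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"not a valid 5-digit {label}: {value!r}")
    return code


def normalize_zcta(value: Union[int, str]) -> str:
    """Return a ZCTA as a 5-character, zero-padded string."""
    return _five_digit(value, "ZCTA")


def split_county_fips(value: Union[int, str]) -> Tuple[str, str]:
    """Split a county FIPS code into (state FIPS, county code)."""
    code = _five_digit(value, "county FIPS code")
    return code[:2], code[2:]


print(normalize_zcta(2138))     # 02138
print(split_county_fips(1001))  # ('01', '001')
```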


---

## Technical Details

Code used to produce this dataset can be found in:  
🔗 https://github.com/NSAPH-Data-Processing/pm25_washu_raster2polygon

The code in this repository is written in Python and orchestrated with **Snakemake**.
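
To illustrate what a raster-to-polygon aggregation does, here is a minimal, dependency-free sketch of a zonal mean: it averages the grid cells whose centers fall inside a polygon. It is not the repository's implementation (the pipeline depends on libraries such as `rasterstats` and `geopandas`, listed below), and the function names, the toy grid, and the cell-center rule are all assumptions made for illustration.

```python
import math
from statistics import fmean
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(x: float, y: float, ring: List[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(grid: List[List[float]], lons: List[float],
               lats: List[float], ring: List[Point]) -> Optional[float]:
    """Mean of non-missing cells whose centers are inside the ring."""
    values = [
        grid[i][j]
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
        if point_in_polygon(lon, lat, ring) and not math.isnan(grid[i][j])
    ]
    return fmean(values) if values else None


# Toy 2 x 3 grid of PM2.5 values (µg/m³) on 0.1° cell centers.
lons = [0.05, 0.15, 0.25]
lats = [0.05, 0.15]
grid = [[8.0, 10.0, 12.0],
        [9.0, 11.0, float("nan")]]
square = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.2), (0.0, 0.2)]

result = zonal_mean(grid, lons, lats, square)
print(result)  # 9.5
```

Only the four cells with centers at longitudes 0.05 and 0.15 fall inside the square, so the result is the mean of 8, 10, 9, and 11. Production tools handle details this sketch ignores, such as polygons with holes, multi-part geometries, and coordinate reference systems.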

### Dependencies

The following dependencies were used to generate these aggregations: 

```yaml
dependencies:
  - python=3.11
  - netcdf4=1.6.5
  - xarray=2023.12.0
  - rasterio=1.3.9
  - rasterstats=0.19.0
  - geopandas=0.14.2
  - pyarrow=14.0.2
  - pip=23.3.2
  - pip:
    - requests==2.31.0
    - wget==3.2
    - hydra-core==1.3.2
    - snakemake==8.1.2
    - selenium==4.29.0
    - chromedriver-binary==135.0.7030.0.0
    - tqdm==4.67.1
    - torch==2.6.0
    - torchaudio==2.6.0
    - torchvision==0.21.0
```

